Audio-visual sensor fusion with neural architectures
Abstract
This paper presents a new word recognition system for monosyllabic words. It is built from two types of neural networks, which makes it easy to investigate three different fusion architectures for audio-visual signals. Two kinds of preprocessing are also compared: besides low-level data, linear discriminant analysis is applied to the audio and visual signals to reduce their dimensionality. Our cross-validation experiments show a slight advantage for an intermediate fusion model over an early fusion model that uses jointly preprocessed audio and visual data.
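The abstract does not describe the three fusion architectures in detail, so the following is only a minimal sketch of where fusion can take place in a classifier, not the authors' networks. It assumes the third architecture is a late (decision-level) fusion, which the abstract does not state. All function names, weight matrices, and feature values are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def early_fusion(audio, video, w_joint):
    # Early fusion: concatenate the two feature vectors and classify them jointly.
    return softmax(linear(w_joint, audio + video))

def intermediate_fusion(audio, video, w_a, w_v, w_out):
    # Intermediate fusion: each modality gets its own hidden layer,
    # and the hidden activations are concatenated before the output layer.
    h_a = [math.tanh(v) for v in linear(w_a, audio)]
    h_v = [math.tanh(v) for v in linear(w_v, video)]
    return softmax(linear(w_out, h_a + h_v))

def late_fusion(audio, video, w_a, w_v):
    # Late fusion (assumed here): classify each modality separately,
    # then average the two class-probability vectors.
    p_a = softmax(linear(w_a, audio))
    p_v = softmax(linear(w_v, video))
    return [(a + b) / 2 for a, b in zip(p_a, p_v)]

# Hypothetical toy input: 3 audio features, 2 visual features, 2 word classes.
audio = [0.5, -1.0, 0.25]
video = [1.0, 0.5]

w_joint = [[0.2, -0.1, 0.4, 0.3, -0.2],
           [-0.3, 0.5, 0.1, -0.4, 0.6]]
w_a_hidden = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
w_v_hidden = [[0.7, -0.8], [-0.9, 1.0]]
w_out = [[0.3, -0.2, 0.5, 0.1], [-0.1, 0.4, -0.6, 0.2]]
w_a_cls = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
w_v_cls = [[0.3, -0.2], [-0.4, 0.6]]

p_early = early_fusion(audio, video, w_joint)
p_mid = intermediate_fusion(audio, video, w_a_hidden, w_v_hidden, w_out)
p_late = late_fusion(audio, video, w_a_cls, w_v_cls)
```

Each function returns one probability per word class. The three variants differ only in the stage at which the audio and visual streams are combined: at the input, at a hidden layer, or at the output.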
Similar resources
Multi-Focus Image Fusion in DCT Domain using Variance and Energy of Laplacian and Correlation Coefficient for Visual Sensor Networks
The purpose of multi-focus image fusion is to gather the essential information and the focused parts of the input multi-focus images into a single image. These multi-focus images are captured with cameras set to different depths of focus. Many multi-focus image fusion techniques have been introduced that measure focus in the spatial domain. However, the multi-focus image ...
Fusion of Multi-Sensor Imagery for Night Vision: Color Visualization, Target Learning and Search
1 This work was sponsored by the U.S. Defense Advanced Research Projects Agency, under Air Force Contract F19628-95-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and not necessarily endorsed by the U.S. Air Force. 2 Present Address: MCIS Department, Jacksonville State University, Jacksonville, AL 36265, U.S.A. Abstract We present methods and result...
Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion
Today’s Automatic Speech Recognition systems rely only on acoustic signals and often perform poorly under noisy conditions. Multi-modal speech recognition, which processes acoustic speech signals and lip-reading video simultaneously, significantly enhances the performance of such systems, especially in noisy environments. This work presents the design of such an audio-visual system for ...
Sensor Fusion Weighting Measures in Audio-Visual Speech Recognition
Audio-Visual Speech Recognition (AVSR) uses vision to enhance speech recognition but also introduces the problem of how to join (or fuse) these two signals together. Mainstream research achieves this using a weighted product of the output of the phoneme classifiers for both modalities. This paper analyses current weighting measures and compares them to several new measures proposed by the autho...
Sensor Fusion for Mobile Robot Navigation - Proceedings of the IEEE
We review techniques for sensor fusion in robot navigation, emphasizing algorithms for self-location. These find use when the sensor suite of a mobile robot comprises several different sensors, some complementary and some redundant. Integrating the sensor readings, the robot seeks to accomplish tasks such as constructing a map of its environment, locating itself in that map, and recognizing obj...
Publication date: 1999